This report describes a model that predicts which solution to climate change survey respondents favor, and explains its predictions.
Version : 0.1
Name : Barometre Changement Climatique
Purpose : Predict the solution to climate change favored by survey respondents
Date : 2023-11-22
Contributors : Thomas Bouche
Description : A survey by ADEME and OpinionWay asks respondents which solution to climate change they favor. We build a model that predicts the chosen solution from the respondent's other answers and characteristics.
Git Commit : None
Origin : Data.gouv.fr
Description : Since 2000, ADEME has run a barometer on social representations of climate change. The survey tracks how the place of the environment among the concerns of the French evolves, along with perceptions of the causes and consequences of climate change, opinions on solutions and public policy measures, and individual engagement. The database covers all waves of the barometer since 2000.
Depth : From 2000 to 2022
Perimeter : the most complete waves from the last 10 years
Target Variable : Solutions
Target Description : Solutions to climate change
Variable Filtering : Kept the variables that appear to be key (about forty)
Individual Filtering : Removed individuals with too many empty values
Missing Values : to be reviewed
Feature Engineering : rebuilt a variable equal to the mean of the actions the respondent considers feasible
Path To Script : https://github.com/ThomasBouche/sensibilisation_explicabilite
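The individual filtering and feature engineering steps listed above can be sketched as follows. This is a minimal illustration in plain Python, not the code from the linked repository: the column names (`age_group`, `action_*`), the sample rows, and the 50% missing-value threshold are all assumptions, since the report only says "too many" empty values.

```python
from statistics import mean

# Hypothetical survey rows: None marks an unanswered question.
# Column names are illustrative, not the real survey fields.
respondents = [
    {"age_group": "25-34", "action_car": 3, "action_meat": 2, "action_heating": 4},
    {"age_group": None, "action_car": None, "action_meat": None, "action_heating": 1},
    {"age_group": "50-64", "action_car": 1, "action_meat": None, "action_heating": 2},
]

MAX_MISSING_SHARE = 0.5  # assumed threshold; the report does not state one


def missing_share(row):
    """Fraction of fields in the row that are empty."""
    return sum(value is None for value in row.values()) / len(row)


def mean_action_score(row, prefix="action_"):
    """Average of the answered action questions, or None if none were answered."""
    answers = [v for k, v in row.items() if k.startswith(prefix) and v is not None]
    return mean(answers) if answers else None


# 1. Individual filtering: drop respondents with too many empty answers.
kept = [row for row in respondents if missing_share(row) <= MAX_MISSING_SHARE]

# 2. Feature engineering: add the mean action score to each remaining respondent.
for row in kept:
    row["mean_actions"] = mean_action_score(row)
```

Here the second respondent is dropped (three of four fields are empty), and the two remaining rows receive a `mean_actions` value computed only from the questions they answered.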
Used Algorithm : CatBoost
Parameters Choice : default parameters
Metrics : AUC
Validation Strategy : We split our data into train (75%) and test (25%)
Path To Script : https://github.com/ThomasBouche/sensibilisation_explicabilite
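For illustration, a 75% / 25% split of this kind can be sketched with the standard library alone. The helper name `train_test_split_indices`, the seed, and the choice to round the test size up are assumptions; the report does not say which splitting utility was actually used.

```python
import math
import random


def train_test_split_indices(n_rows, test_share=0.25, seed=0):
    """Shuffle row indices and cut them into train and test index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_share)  # round the test size up
    return indices[n_test:], indices[:n_test]


# 19,862 + 6,621 rows: the dataset sizes listed in this report.
train_idx, test_idx = train_test_split_indices(26_483)
print(len(train_idx), len(test_idx))  # 19862 6621
```

With the test size rounded up, the sketch reproduces the training and prediction dataset sizes reported in the dataset tables.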
Model used : CatBoostClassifier
Library : catboost.core
Library version : 1.2.2
Model parameters :
| Parameter key | Parameter value |
|---|---|
| _object | <_catboost._CatBoost object at 0x7f4af4b633d0> |
| _init_params | {'max_depth': 5, 'scale_pos_weight': 1.2} |
| _is_fitted_ | True |
| _random_seed | 0 |
| _learning_rate | 0.036913998425006866 |
| _tree_count | 1000 |
| _n_features_in | 0 |
| _prediction_values_change | [2.0346833081487796, 3.120371255684199, 6.57377260598391, 2.567577368551938, 2.3927540534418066, 3.1781498563106574, 2.021230804942296, 6.668565883975401, 2.646171328293313, 2.608325998126067, 2.5538167033696104, 2.2766670921713823, 2.019612015813633, 2.038943666918077, 1.7562231349278974, 2.8401239217163625,... |
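As a rough sketch, a classifier with the reported constructor arguments could be rebuilt as below. The function name `build_classifier` is hypothetical, and `verbose=False` is added only to silence training logs. The learning rate and tree count from the table are deliberately not set, on the assumption that they were left to CatBoost's defaults.

```python
# Constructor arguments copied from the `_init_params` row of the parameter table.
INIT_PARAMS = {"max_depth": 5, "scale_pos_weight": 1.2}
RANDOM_SEED = 0  # matches the `_random_seed` row


def build_classifier():
    """Create an unfitted CatBoostClassifier with the reported settings."""
    # Imported lazily so the settings can be inspected without catboost installed.
    from catboost import CatBoostClassifier

    return CatBoostClassifier(random_seed=RANDOM_SEED, verbose=False, **INIT_PARAMS)
```

A model built this way would then be trained with `build_classifier().fit(X_train, y_train)`, where `X_train` and `y_train` stand for the training features and target.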
| | Training dataset | Prediction dataset |
|---|---|---|
| number of features | 31 | 31 |
| number of observations | 19,862 | 6,621 |
| missing values | 0 | 0 |
| % missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 3 | 3 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 3 | 3 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 8 | 8 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 7 | 7 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 4 | 4 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 2 | 2 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 3 | 3 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 4 | 4 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 4 | 4 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 7 | 7 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| count | 19,862 | 6,621 |
| mean | 2.41 | 2.4 |
| std | 1.39 | 1.4 |
| min | 0 | 0 |
| 25% | 2 | 2 |
| 50% | 3 | 3 |
| 75% | 3.36 | 3.36 |
| max | 4 | 4 |
| | Training dataset | Prediction dataset |
|---|---|---|
| count | 19,862 | 6,621 |
| mean | 1.85 | 1.82 |
| std | 1.39 | 1.39 |
| min | 0 | 0 |
| 25% | 0 | 0 |
| 50% | 2.42 | 2.4 |
| 75% | 3 | 3 |
| max | 4 | 4 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 6 | 6 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 15 | 15 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 10 | 10 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 5 | 5 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 6 | 6 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 4 | 4 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 3 | 3 |
| missing values | 0 | 0 |
| | Training dataset | Prediction dataset |
|---|---|---|
| distinct values | 2 | 2 |
| missing values | 0 | 0 |
Note : the explainability graphs were generated using the test set only.
| | True values | Prediction values |
|---|---|---|
| distinct values | 2 | 2 |
| missing values | 0 | 0 |
AUC : 0.635
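To make the metric concrete: an AUC of 0.635 means the model scores a randomly chosen positive example above a randomly chosen negative one about 63.5% of the time, where 0.5 corresponds to random ranking. The sketch below computes AUC from that pairwise definition. The function name `roc_auc` and the toy labels and scores are made up for illustration and are not taken from the survey data; the report does not state which implementation produced its score.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and scores, made up for illustration.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The double loop is quadratic in the number of rows, which is fine for a demonstration; on a test set of several thousand rows a rank-based implementation would be the usual choice.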